optimal model averaging
Optimal Model Averaging of Support Vector Machines in Diverging Model Spaces
Yuan, Chaoxia, Ying, Chao, Yu, Zhou, Fang, Fang
Support vector machine (SVM) is a powerful classification method that has achieved great success in many fields. Since its performance can be seriously impaired by redundant covariates, model selection techniques are widely used for SVM with high dimensional covariates. As an alternative to model selection, significant progress has been made in the area of model averaging in the past decades. Yet no frequentist model averaging method was considered for SVM. This work aims to fill the gap and to propose a frequentist model averaging procedure for SVM which selects the optimal weight by cross validation. Even when the number of covariates diverges at an exponential rate of the sample size, we show asymptotic optimality of the proposed method in the sense that the ratio of its hinge loss to the lowest possible loss converges to one. We also derive the convergence rate which provides more insights to model averaging. Compared to model selection methods of SVM which require a tedious but critical task of tuning parameter selection, the model averaging method avoids the task and shows promising performances in the empirical studies.
Optimal Model Averaging: Towards Personalized Collaborative Learning
Grimberg, Felix, Hartley, Mary-Anne, Karimireddy, Sai P., Jaggi, Martin
In federated learning, differences in the data or objectives between the participating nodes motivate approaches to train a personalized machine learning model for each node. One such approach is weighted averaging between a locally trained model and the global model. In this theoretical work, we study weighted model averaging for arbitrary scalar mean estimation problems under minimal assumptions on the distributions. In a variant of the bias-variance trade-off, we find that there is always some positive amount of model averaging that reduces the expected squared error compared to the local model, provided only that the local model has a non-zero variance. Further, we quantify the (possibly negative) benefit of weighted model averaging as a function of the weight used and the optimal weight. Taken together, this work formalizes an approach to quantify the value of personalization in collaborative learning and provides a framework for future research to test the findings in multivariate parameter estimation and under a range of assumptions.